keystep recognition
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > Middle East > Israel (0.04)
- Research Report > Promising Solution (0.68)
- Instructional Material > Course Syllabus & Notes (0.43)
- Education > Educational Technology > Audio & Video (0.53)
- Education > Educational Technology > Media (0.43)
Video-Mined Task Graphs for Keystep Recognition in Instructional Videos
Procedural activity understanding requires perceiving human actions in terms of a broader task, where multiple keysteps are performed in sequence across a long video to reach a final goal state---such as the steps of a recipe or the steps of a DIY fix-it task. Prior work largely treats keystep recognition in isolation of this broader structure, or else rigidly confines keysteps to align with a particular sequential script. We propose discovering a task graph automatically from how-to videos to represent probabilistically how people tend to execute keysteps, then leverage this graph to regularize keystep recognition in novel videos. On multiple datasets of real-world instructional video, we show the impact: more reliable zero-shot keystep localization and improved video representation learning, exceeding the state of the art.
- Education > Educational Technology > Media (0.68)
- Education > Educational Technology > Audio & Video (0.68)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > Middle East > Israel (0.04)
- Research Report > Promising Solution (0.68)
- Instructional Material > Course Syllabus & Notes (0.43)
- Education > Educational Technology > Audio & Video (0.53)
- Education > Educational Technology > Media (0.43)
Video-Mined Task Graphs for Keystep Recognition in Instructional Videos
Procedural activity understanding requires perceiving human actions in terms of a broader task, where multiple keysteps are performed in sequence across a long video to reach a final goal state---such as the steps of a recipe or the steps of a DIY fix-it task. Prior work largely treats keystep recognition in isolation of this broader structure, or else rigidly confines keysteps to align with a particular sequential script. We propose discovering a task graph automatically from how-to videos to represent probabilistically how people tend to execute keysteps, then leverage this graph to regularize keystep recognition in novel videos. On multiple datasets of real-world instructional video, we show the impact: more reliable zero-shot keystep localization and improved video representation learning, exceeding the state of the art.
- Education > Educational Technology > Media (0.66)
- Education > Educational Technology > Audio & Video (0.66)